Conversation

@dgoodwin
Contributor

@dgoodwin dgoodwin commented Dec 10, 2025

This lets us track whether we are getting better or worse and compare against past releases. I will be backporting it.

Also stopped generating metrics endpoint down intervals when they overlap with node reboots. This should make the tracked total more accurate.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests are triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci bot requested review from deads2k and p0lyn0mial December 10, 2025 14:06
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2025
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@dgoodwin
Contributor Author

/label acknowledge-critical-fixes-only
/verified by dgoodwin

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Dec 10, 2025
@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Dec 10, 2025
@openshift-ci-robot

@dgoodwin: This PR has been marked as verified by dgoodwin.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dgoodwin
Contributor Author

Couldn't repro in the PR but the files are there.

Only generate metrics down intervals if they do not overlap with node
reboots or updates.

Sum the total time we were in metrics endpoint down on any node with a
new generic monitortest for this purpose. Also sum high cpu intervals.

This will allow us to track if we're making things better with changes
and compare to past releases.
@dgoodwin dgoodwin force-pushed the kubelet-metrics-total-outage branch from aa24d89 to c3bdb3e Compare December 16, 2025 19:25
@dgoodwin
Contributor Author

/retest

@dgoodwin
Contributor Author

/pipeline required

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@dgoodwin dgoodwin changed the title from "Track the total kubelet metrics outage durations with autodl framework" to "NO-JIRA: Track the total kubelet metrics outage durations with autodl framework" Jan 6, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 6, 2026
@openshift-ci-robot

@dgoodwin: This pull request explicitly references no jira issue.

@dgoodwin
Contributor Author

dgoodwin commented Jan 6, 2026

/retest

@openshift-trt

openshift-trt bot commented Jan 6, 2026

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: c3bdb3e

  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum cleanup" [Total: 15, Pass: 15, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum collection" [Total: 15, Pass: 15, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum interval construction" [Total: 15, Pass: 15, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum preparation" [Total: 15, Pass: 15, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum setup" [Total: 15, Pass: 15, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum test evaluation" [Total: 15, Pass: 15, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum writing to storage" [Total: 15, Pass: 15, Fail: 0, Flake: 0]

TableName: "interval_duration_sum",
Schema: map[string]dataloader.DataType{
	"IntervalSource":       dataloader.DataTypeString,
	"TotalDurationSeconds": dataloader.DataTypeFloat64,
Contributor

Do we expect the seconds to ever be fractional? Not a huge issue but curious.

Contributor Author

Int would definitely be cleaner. I'll update.

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jan 7, 2026
@neisw
Contributor

neisw commented Jan 7, 2026

/lgtm

/hold
just to validate the artifact gets picked up in ci-data-loader-test

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 7, 2026
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 7, 2026
TableName: "interval_duration_sum",
Schema: map[string]dataloader.DataType{
	"IntervalSource":       dataloader.DataTypeString,
	"TotalDurationSeconds": dataloader.DataTypeInt64,
Contributor

Whelp, looks like it is DataTypeInteger
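
To illustrate why the declared column type matters, here is a rough sketch that checks row values against a schema. The `DataType` type, its constants, and `validateRow` are local stand-ins written for this example, not the real `dataloader` package, and representing rows as `map[string]string` is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// DataType and its constants are local stand-ins for this sketch;
// they are not the real dataloader package definitions.
type DataType string

const (
	DataTypeString  DataType = "string"
	DataTypeInteger DataType = "integer"
)

// validateRow checks that every schema column is present in the row
// and that integer columns hold values that parse as integers.
func validateRow(schema map[string]DataType, row map[string]string) error {
	for col, typ := range schema {
		val, ok := row[col]
		if !ok {
			return fmt.Errorf("missing column %q", col)
		}
		if typ == DataTypeInteger {
			if _, err := strconv.ParseInt(val, 10, 64); err != nil {
				return fmt.Errorf("column %q: %q is not an integer", col, val)
			}
		}
	}
	return nil
}

func main() {
	schema := map[string]DataType{
		"IntervalSource":       DataTypeString,
		"TotalDurationSeconds": DataTypeInteger,
	}
	// "ExampleSource" is a made-up value for illustration.
	good := map[string]string{"IntervalSource": "ExampleSource", "TotalDurationSeconds": "75"}
	bad := map[string]string{"IntervalSource": "ExampleSource", "TotalDurationSeconds": "75.4"}

	fmt.Println(validateRow(schema, good) == nil) // prints "true"
	fmt.Println(validateRow(schema, bad) == nil)  // prints "false"
}
```

A fractional value such as "75.4" fails the integer check, which is the kind of mismatch that switching the column type and the emitted value together avoids.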

@neisw
Contributor

neisw commented Jan 7, 2026

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 7, 2026
@dgoodwin dgoodwin force-pushed the kubelet-metrics-total-outage branch from 1d81129 to 63129f5 Compare January 7, 2026 20:14
@dgoodwin
Contributor Author

dgoodwin commented Jan 7, 2026

/override ci/prow/okd-scos-images

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Contributor

openshift-ci bot commented Jan 7, 2026

@dgoodwin: Overrode contexts on behalf of dgoodwin: ci/prow/okd-scos-images

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@neisw
Contributor

neisw commented Jan 7, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 7, 2026
@openshift-ci
Contributor

openshift-ci bot commented Jan 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neisw
Contributor

neisw commented Jan 8, 2026

/hold cancel

Verified the artifacts are imported properly

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2026
@openshift-trt

openshift-trt bot commented Jan 8, 2026

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 63129f5

  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum cleanup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum collection" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum interval construction" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum preparation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum setup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum test evaluation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum writing to storage" [Total: 12, Pass: 12, Fail: 0, Flake: 0]

@dgoodwin
Contributor Author

dgoodwin commented Jan 8, 2026

/verified by bigquery data being present

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jan 8, 2026
@openshift-ci-robot

@dgoodwin: This PR has been marked as verified by bigquery data being present.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 0d04a81 and 2 for PR HEAD 63129f5 in total

@openshift-ci
Contributor

openshift-ci bot commented Jan 8, 2026

@dgoodwin: all tests passed!

Full PR test history. Your PR dashboard.

@openshift-merge-bot openshift-merge-bot bot merged commit f07be90 into openshift:main Jan 8, 2026
20 checks passed

Labels

  • acknowledge-critical-fixes-only: Indicates if the issuer of the label is OK with the policy.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

3 participants